
    Multi-view Metric Learning in Vector-valued Kernel Spaces

    We consider the problem of metric learning for multi-view data and present a novel method for learning within-view as well as between-view metrics in vector-valued kernel spaces, as a way to capture the multi-modal structure of the data. We formulate two convex optimization problems to jointly learn the metric and the classifier or regressor in kernel feature spaces. An iterative three-step multi-view metric learning algorithm is derived from the optimization problems. In order to scale the computation to large training sets, a block-wise Nyström approximation of the multi-view kernel matrix is introduced. We justify our approach theoretically and experimentally, and show its performance on real-world datasets against relevant state-of-the-art methods.
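
    As a rough illustration of the scalability idea above, the sketch below applies a standard Nyström approximation independently to each within-view kernel block, so that the full multi-view kernel matrix is never formed explicitly. This is a minimal Python sketch under that block-wise reading; the function names (rbf_kernel, nystrom_factors), the RBF kernel and the uniform landmark sampling are illustrative assumptions, not the authors' implementation.

        import numpy as np

        def rbf_kernel(X, Y, gamma=1.0):
            # Pairwise RBF kernel between the rows of X and Y.
            d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
            return np.exp(-gamma * np.maximum(d2, 0.0))

        def nystrom_factors(X, m, gamma=1.0, seed=0):
            # Return (C, W_pinv) such that K ~= C @ W_pinv @ C.T, using m landmark samples.
            rng = np.random.default_rng(seed)
            idx = rng.choice(len(X), size=m, replace=False)
            C = rbf_kernel(X, X[idx], gamma)        # n x m slice of the view's kernel matrix
            W_pinv = np.linalg.pinv(C[idx])         # pseudo-inverse of the m x m landmark core
            return C, W_pinv

        # Block-wise use on multi-view data: one low-rank factorisation per view,
        # so each n x n within-view block is only ever handled through n x m factors.
        views = [np.random.randn(500, 10), np.random.randn(500, 25)]
        factors = [nystrom_factors(X, m=50, gamma=0.5, seed=v) for v, X in enumerate(views)]
        C0, W0 = factors[0]
        K_view0_approx = C0 @ W0 @ C0.T             # approximation of the first within-view block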

    On multi-class learning through the minimization of the confusion matrix norm

    In imbalanced multi-class classification problems, the misclassification rate as an error measure may not be a relevant choice. Several methods have been developed in which the performance measure retains richer information than the mere misclassification rate: misclassification costs, ROC-based information, etc. Following this idea of working with alternative measures of performance, we propose to address imbalanced classification problems by optimizing a new measure: the norm of the confusion matrix. Indeed, recent results show that using the norm of the confusion matrix as an error measure can be quite interesting due to the fine-grained information contained in the matrix, especially in the case of imbalanced classes. Our first contribution consists in showing that optimizing a criterion based on the confusion matrix gives rise to a common background for cost-sensitive methods aimed at dealing with imbalanced-class learning problems. As our second contribution, we propose an extension of a recent multi-class boosting method, namely AdaBoost.MM, to the imbalanced-class problem, by greedily minimizing the empirical norm of the confusion matrix. A theoretical analysis of the properties of the proposed method is presented, while experimental results illustrate the behavior of the algorithm and show the relevance of the approach compared to other methods.
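
    To make the error measure concrete, the following minimal Python sketch computes the norm of a row-normalised confusion matrix with its diagonal zeroed, so that each off-diagonal entry is a per-class error rate; the exact normalisation and norm used in the paper may differ, and the function name confusion_matrix_norm is ours.

        import numpy as np

        def confusion_matrix_norm(y_true, y_pred, n_classes, norm_ord=2):
            # Entry (i, j), i != j, is the fraction of class-i examples predicted as class j,
            # so the norm stays large whenever any class, however rare, is badly treated.
            C = np.zeros((n_classes, n_classes))
            for t, p in zip(y_true, y_pred):
                C[t, p] += 1.0
            C /= np.maximum(C.sum(axis=1, keepdims=True), 1.0)   # per-class rates
            np.fill_diagonal(C, 0.0)                             # keep only the confusions
            return np.linalg.norm(C, ord=norm_ord)               # operator norm by default

        # A majority-class predictor has a 10% error rate but a confusion matrix norm of 1.
        y_true = np.array([0] * 90 + [1] * 10)
        y_pred = np.zeros(100, dtype=int)
        print(confusion_matrix_norm(y_true, y_pred, n_classes=2))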

    Design and Implementation of a Type System for a Knowledge Representation System

    A knowledge representation system (KRS) is made up of both a language to represent knowledge of a domain and well-defined reasoning facilities to infer new knowledge from known facts. This paper deals with KRSs close to frame-based systems, which include description logic systems and object-based systems. In these systems, the main relation that leads to inferences is subsumption. Knowledge terms are described through roles which refer to either other knowledge terms or data types. Subsumption between term descriptions is usually interpreted as data set inclusion, where a datum is either a knowledge term or an external term (integer, string, etc.). Although subsumption between knowledge terms is well-defined, its implementation on external data depends upon the host language, since these are in fact the data types of the KRS. As a consequence, no KRS is able to integrate a new data type (e.g. Matrix) such that its values can be safely involved in subsumption and further inferences. This is the problem addressed in this paper. The proposed solution is the design of a polymorphic type system connected to both the KRS and the host language. It is designed so that it can extend the KRS with any data type implementation available in the host language (library or user-coded), while the values of the new data type are safely involved in the KRS reasoning processes. The presented type system avoids the incompleteness of subsumption that stems from its incomplete processing of external data.
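
    A minimal Python sketch of the "subsumption as value-set inclusion" reading for an external data type; the IntRange type and its subsumes method are hypothetical illustrations, not part of the system described in the paper.

        from dataclasses import dataclass

        @dataclass(frozen=True)
        class IntRange:
            # A hypothetical external data type whose values are the integers in [lo, hi].
            lo: int
            hi: int

            def subsumes(self, other: "IntRange") -> bool:
                # Subsumption on data types read as inclusion of value sets:
                # self subsumes other iff every value of other is also a value of self.
                return self.lo <= other.lo and other.hi <= self.hi

        # A role restricted to [0, 100] subsumes the same role restricted to [10, 20].
        assert IntRange(0, 100).subsumes(IntRange(10, 20))
        assert not IntRange(10, 20).subsumes(IntRange(0, 100))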

    A Protocol to Detect Local Affinities Involved in Proteins Distant Interactions

    The three-dimensional structure of a protein is constrained or stabilized by local interactions between distant residues of the protein, such as disulfide bonds, electrostatic interactions, hydrogen bonds, van der Waals forces, etc. The correct prediction of such contacts would be an important step towards the broader challenge of three-dimensional structure prediction. The in silico prediction of disulfide connectivity has been widely studied: most results are based on the few amino acids around bonded and non-bonded cysteines, which we call the local environments of bonded residues. In order to evaluate the impact of such local information on residue pairing, we propose a machine learning based protocol, independent of the type of contact, to detect affinities between local environments which would contribute to residue pairing. This protocol requires learning methods that are able to learn from examples corrupted by class-conditional classification noise. To this end, we propose an adapted version of the perceptron algorithm. Finally, we evaluate our protocol with this algorithm on proteins that feature disulfide or salt bridges. The results show that local environments contribute to the formation of salt bridges. As a by-product, these results demonstrate the relevance of our protocol. However, results on disulfide bridges are not significantly positive. There are two possible explanations: either the class of linear functions used by the perceptron algorithm is not expressive enough to detect this information, or the local environments of cysteines do not contribute significantly to residue pairing.
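
    For reference, the sketch below shows the standard mistake-driven perceptron loop that the adapted algorithm starts from; the paper's noise-tolerant modification and the actual encoding of local environments are not reproduced here, and the toy features are purely illustrative.

        import numpy as np

        def perceptron(X, y, epochs=50, lr=1.0):
            # Plain perceptron on labels in {-1, +1}; the paper adapts this update
            # so that learning remains possible under class-conditional label noise.
            w = np.zeros(X.shape[1])
            b = 0.0
            for _ in range(epochs):
                for x, t in zip(X, y):
                    if t * (w @ x + b) <= 0:    # mistake-driven update
                        w += lr * t * x
                        b += lr * t
            return w, b

        # Toy usage: each row encodes a pair of local environments (e.g. sequence windows
        # around two residues); the label says whether the two residues are bonded.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 40))
        y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)
        w, b = perceptron(X, y)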

    Cross-view kernel transfer

    We consider the kernel completion problem in the presence of multiple views in the data. In this setting, data samples can be entirely missing in some views, creating missing rows and columns in the kernel matrices that are computed individually for each view. We propose to complete the kernel matrices with a Cross-View Kernel Transfer (CVKT) procedure, in which the features of the other views are transformed to represent the view under consideration. The transformations are learned by kernel alignment to the known part of the kernel matrix, allowing generalizable structures to be found in the kernel matrix under completion. Its missing values can then be predicted from the data available in the other views. We illustrate the benefits of our approach on simulated data, a multivariate digits dataset and a multi-view gesture classification dataset, as well as on real biological datasets from studies of pattern formation in early Drosophila melanogaster embryogenesis.
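
    The alignment criterion at the heart of such a procedure can be stated in a few lines. The Python sketch below is a generic centred kernel alignment computation (the center and kernel_alignment helpers are ours); in a CVKT-style use, the transformation of the other views would be fitted by maximising this score on the observed part of the target kernel matrix.

        import numpy as np

        def center(K):
            # Centre a kernel matrix in feature space: H K H with H = I - (1/n) 11^T.
            n = K.shape[0]
            H = np.eye(n) - np.ones((n, n)) / n
            return H @ K @ H

        def kernel_alignment(K1, K2):
            # Centred kernel alignment: Frobenius inner product of the centred matrices,
            # normalised to lie in [0, 1] for positive semi-definite inputs.
            K1c, K2c = center(K1), center(K2)
            return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c) + 1e-12)

        # Identical kernels are perfectly aligned.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(50, 5))
        K = X @ X.T
        print(kernel_alignment(K, K))   # ~1.0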

    Apports de la modélisation algébrique pour la représentation de connaissances par objets : illustration en AROM

    AROM is a knowledge representation system based, like UML class diagrams, on two complementary kinds of modeling entities: classes and associations. It integrates an algebraic modeling language (LMA) that supports several inference mechanisms. This language allows equations, constraints and queries to be written over class and association instances. The presence of a type module in AROM makes it possible to extend the set of types (and hence of values and operators) supported by the LMA. Through the description of AROM's LMA, this article highlights what an algebraic modeling language brings to a knowledge representation system, both in terms of declarativity and in terms of the inferences it makes possible.

    Objects, types and constraints as classification schemes (abstract)

    The notion of classification scheme is a generic model that encompasses the kind of classification performed in many knowledge representation formalisms. Classification schemes abstract away from the structure of individuals and consider only a sub-categorization relationship. The product of classification schemes is itself a classification scheme, and it provides various classification algorithms that rely on the classification defined for each member of the product. Object-based representation formalisms often use heterogeneous ways of representing knowledge. In the particular case of the TROPES system, knowledge is expressed by classes, types and constraints. We present how types and constraints are expressed in a type description module that gives them the simple structure of classification schemes. This mapping allows the integration into TROPES of new types and constraints together with their sub-typing relation. Taxonomies of classes are then themselves considered as classification schemes that are products of more primitive ones, and this information is sufficient for classifying TROPES objects.
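
    As an illustration of the product construction, the Python sketch below keeps only a set of categories and a sub-categorization relation, and orders pairs component-wise; the ClassificationScheme class and the toy type/constraint schemes are hypothetical stand-ins, not the TROPES data structures.

        from dataclasses import dataclass
        from typing import Any, Callable

        @dataclass
        class ClassificationScheme:
            # Only the ordering is kept, not the internal structure of individuals.
            categories: set
            below: Callable[[Any, Any], bool]

        def product(s1: ClassificationScheme, s2: ClassificationScheme) -> ClassificationScheme:
            # The product is again a classification scheme: pairs of categories,
            # ordered component-wise, so classification proceeds per component.
            cats = {(a, b) for a in s1.categories for b in s2.categories}
            return ClassificationScheme(
                cats, lambda x, y: s1.below(x[0], y[0]) and s2.below(x[1], y[1])
            )

        # A toy type scheme (int below number) times a toy interval-constraint scheme.
        types = ClassificationScheme({"int", "number"},
                                     lambda a, b: a == b or (a == "int" and b == "number"))
        ranges = ClassificationScheme({(0, 10), (0, 100)},
                                      lambda a, b: b[0] <= a[0] and a[1] <= b[1])
        combined = product(types, ranges)
        print(combined.below(("int", (0, 10)), ("number", (0, 100))))   # True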

    The Multi-Task Learning View of Multimodal Data

    International audienceWe study the problem of learning from multiple views using kernel methods in a supervised setting. We approach this problem from a multi-task learning point of view and illustrate how to capture the interesting multimodal structure of the data using multi-task kernels. Our analysis shows that the multi-task perspective offers the flexibility to design more efficient multiple-source learning algorithms, and hence the ability to exploit multiple descriptions of the data. In particular, we formulate the multimodal learning framework using vector-valued reproducing kernel Hilbert spaces, and we derive specific multi-task kernels that can operate over multiple modalities. Finally, we analyze the vector-valued regularized least squares algorithm in this context, and demonstrate its potential in a series of experiments with a real-world multimodal data set
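
    A minimal Python sketch of vector-valued regularized least squares with a separable multi-task kernel K(x, x') = k(x, x') B, where the matrix B couples the outputs. Separable kernels are only one simple family of multi-task kernels, so this is an illustrative baseline rather than the specific kernels derived in the paper, and all function names are ours.

        import numpy as np

        def rbf_gram(X, Y, gamma=1.0):
            # Pairwise RBF kernel between the rows of X and Y.
            d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
            return np.exp(-gamma * np.maximum(d2, 0.0))

        def vv_rls_fit(X, Y, B, lam=0.1, gamma=1.0):
            # Separable kernel K(x, x') = k(x, x') * B; the representer coefficients C (n x T)
            # solve G C B + lam * C = Y, written here via vec(G C B) = (B kron G) vec(C).
            G = rbf_gram(X, X, gamma)
            n, T = Y.shape
            A = np.kron(B, G) + lam * np.eye(n * T)
            vec_C = np.linalg.solve(A, Y.flatten(order="F"))
            return vec_C.reshape(n, T, order="F")

        def vv_rls_predict(X_train, X_test, C, B, gamma=1.0):
            # f(x) = sum_i k(x, x_i) B c_i, stacked row-wise for the test points.
            return rbf_gram(X_test, X_train, gamma) @ C @ B

        # Toy usage with two strongly coupled outputs.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 3))
        Y = np.column_stack([X[:, 0], X[:, 0] + 0.1 * rng.normal(size=100)])
        B = np.array([[1.0, 0.9], [0.9, 1.0]])
        C = vv_rls_fit(X, Y, B, lam=0.1, gamma=0.5)
        Y_hat = vv_rls_predict(X, X, C, B, gamma=0.5)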

    Identification et Exploitation des Types dans un modèle de connaissances à objets

    Object-based knowledge models (MCO) suffer from an overload in the use of their associated representation language. While this language is intended to be suited to the computational representation of an application domain, we show that it is not appropriate to use it to define data structures which, although useful for representing the domain, have no direct meaning in that domain (e.g. a matrix in the domain of astronomy). This thesis proposes a two-level type system, called METÈO. The first level of METÈO is a language for implementing the abstract data types (ADTs) that are necessary for a minimal description of the relevant elements of the application domain. METÈO thus relieves the representation language of a task it should not have to accommodate. The second level of METÈO deals with the refinement of ADTs carried out in the description of representation objects. We recall the two interpretations of representation objects: the intension of an object is an attempt to describe what that object denotes in the application domain, namely its extension. The equivalence generally assumed between these two aspects of an object is an illusion and, moreover, undermines one of the true purposes of a knowledge model: supporting the most precise possible characterization of an application domain. The types of the second level of METÈO are therefore devoted to representing and manipulating the intensions of objects, independently of their extensions. The extensional interpretation of objects is performed by the user; METÈO internally manages the descriptions of these objects, stripped of their meaning, and the MCO can then concentrate on the cooperation between these two aspects of objects, which this work considers non-equivalent. METÈO thus helps clarify the role of each partner involved in building and exploiting a knowledge base. More generally, METÈO builds a bridge between the specific features of MCOs and the usual programming techniques for manipulable data structures. A prototype of METÈO has been developed for coupling with the TROPES MCO.